Abstract: Information retrieval is a difficult process, especially when the application must process a large amount of data. It may require computing indices for the records, processing the records, placing each record in its proper place using a hashing technique, and so on. Depending on the type of processing required, retrieval can be categorized as data-intensive or compute-intensive storage and/or retrieval. Data-intensive and compute-intensive systems encompass terabytes to petabytes of data. They require massive storage and intensive computational power in order to execute complex queries and generate timely results. In addition, the pace at which data is growing all over the world adds fuel to the fire. Retrieving data from such a large volume is like finding a needle in a haystack. The task becomes more difficult when the data is stored in the cloud and must be migrated from one place to another in order to fulfill a request or provide a service. Such migration requires selecting the optimum resource from the appropriate neighbor to provide the service to the end user. This paper proposes a resource selection technique using K-means clustering and Ant Colony Optimization for data-intensive applications in which data is to be migrated from one source to another.

Keywords: Cloud Computing, VM Allocation, Virtual Machine, CloudSim, Clustering.